ABSTRACT


Empirical studies that use nonlinear models commonly estimate the
mean response by evaluating the nonlinear response function at the mean
value of its argument(s). However, this procedure is conceptually flawed
if the response function has significant curvature in the neighborhood
of the mean. Ideally, one should evaluate the estimated response
function for each observation and average the results. In general, there
will be some nonzero approximation error if one instead simply evaluates
the response function at the mean of the independent variable(s).
Furthermore, the approximation error of the "evaluate at the mean"
procedure increases with the variability of the independent variable(s)
of interest. This paper examines the magnitude of the approximation
error and attempts to identify situations in which somewhat more
computationally intensive procedures should be used.
INTRODUCTION


Once  the  parameters  of  a  nonlinear  econometric  model  have  been 
estimated,  it  usually  is  of  interest  to  use  these  estimates  to  generate 
an  estimate  of  mean  response.  The  most  common  practice  is  to  compute 
the  value  of  the  function  at  the  mean  values  of  the  independent 
variables.  Although  this  procedure  works  for  linear  models,  it  is, 
strictly  speaking,  inappropriate  for  nonlinear  models.  Rather,  one 
should  compute  the  value  of  the  function  for  each  observation  and  then 
estimate  the  mean  response  by  the  mean  predicted  value  of  the  function. 
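
The two procedures can be contrasted in a minimal sketch. The data, the coefficients, and the function names below are hypothetical and chosen only for illustration; the logistic function stands in for any nonlinear response function.

```python
import math

def logistic(z):
    """Logistic distribution function F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def index(row, beta, intercept):
    """Linear index for one observation."""
    return intercept + sum(b * x for b, x in zip(beta, row))

def response_at_means(rows, beta, intercept):
    """'Evaluate at the means': F applied to the mean of each variable."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return logistic(index(means, beta, intercept))

def mean_of_responses(rows, beta, intercept):
    """Evaluate F for every observation, then average the predictions."""
    return sum(logistic(index(r, beta, intercept)) for r in rows) / len(rows)

# Hypothetical observations (two regressors) and hypothetical coefficients.
rows = [(0.0, 1.0), (1.0, 3.0), (1.0, 8.0), (0.0, 12.0)]
beta = (-0.5, 0.25)
intercept = -2.0

at_means = response_at_means(rows, beta, intercept)
averaged = mean_of_responses(rows, beta, intercept)
print(f"F at the means:      {at_means:.4f}")
print(f"mean of F(X_i beta): {averaged:.4f}")
```

With these made-up numbers the two estimates differ by about 3 percentage points; with a linear response function they would coincide exactly.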

However,  adherents  of  the  "evaluate  at  the  means"  procedure  may 
fairly  ask  whether  it  makes  any  practical  difference  to  compute  the  mean 
response  in  this  somewhat  more  complicated  fashion.  This  paper 
describes  some  conditions  under  which  it  matters  how  the  mean  response 
is  calculated  and  provides  some  examples  for  these  cases. 

ALGEBRAIC REPRESENTATION OF MEAN RESPONSE

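The choice of how to evaluate the mean response of a nonlinear
function depends on how nonlinear the function is in the neighborhood of
the mean of the independent variable and on the variability of the
independent variable. These considerations can be made more explicit
with the aid of the following second-order Taylor expansion of some
arbitrary function F about the mean of its argument, X̄. Let

$$ F(X_i) \approx F(\bar{X}) + F^{(1)}(\bar{X})\,(X_i - \bar{X}) + \frac{1}{2} F^{(2)}(\bar{X})\,(X_i - \bar{X})^2 . \qquad (1) $$

Averaging equation 1 over the n observations eliminates the first-order
term, because the deviations X_i - X̄ sum to zero. Then

$$ \frac{1}{n} \sum_{i} F(X_i) \approx F(\bar{X}) + \frac{1}{2} F^{(2)}(\bar{X})\, \frac{1}{n} \sum_{i} (X_i - \bar{X})^2 , \qquad (2) $$

and

$$ \left| \frac{1}{n} \sum_{i} F(X_i) - F(\bar{X}) \right| \approx \frac{1}{2} \left| F^{(2)}(\bar{X}) \right| \mathrm{Var}(X) . \qquad (3) $$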
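
A quick numerical check of the second-order approximation is sketched below. It assumes the logistic function for F and uses five made-up values of X; neither the data nor the function names come from the study.

```python
import math

def F(x):
    """Logistic distribution function, used here as an example of F."""
    return 1.0 / (1.0 + math.exp(-x))

def F2(x):
    """Second derivative of the logistic function: F(1 - F)(1 - 2F)."""
    p = F(x)
    return p * (1.0 - p) * (1.0 - 2.0 * p)

def compare(xs):
    """Return (exact gap, second-order approximation of the gap)."""
    n = len(xs)
    xbar = sum(xs) / n
    var = sum((x - xbar) ** 2 for x in xs) / n
    exact = abs(sum(F(x) for x in xs) / n - F(xbar))
    approx = 0.5 * abs(F2(xbar)) * var
    return exact, approx

# Hypothetical sample with mean -1.0 and variance 0.116.
xs = [-1.5, -1.2, -1.0, -0.8, -0.5]
exact, approx = compare(xs)
print(f"exact gap:          {exact:.5f}")
print(f"0.5 * |F''| * Var:  {approx:.5f}")
```

For this small, tightly clustered sample the two numbers agree to about three decimal places; the agreement should not be expected to hold as closely when the spread of X is large.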


1.  Obviously,  if  the  function  in  question  is  easy  to  evaluate,  there  is 
not  much  excuse  for  using  the  mean  values  of  the  independent  variables. 
However,  if  function  evaluation  is  cumbersome,  this  is  a  more  legitimate 
concern.

2.  Functions  with  vector  arguments  present  no  special  difficulties  to 
the  following  discussion;  a  scalar  argument  is  used  purely  for 
expository  convenience. 
What determines the value of equation 3, which represents the difference
between the two methods for computing mean response? First, there is the
curvature of the function F local to the mean of its argument. If F is
linear, then F^(2) is zero and the two methods are trivially identical.
If, on the other hand, F is strongly curved in the neighborhood of X̄,
then there can be serious error in using F(X̄).


Second, even if F is strongly curved, if Var(X) is small the error
in approximating (1/n) Σ F(X_i) with F(X̄) will be small as well. With a
large Var(X), even small amounts of curvature in the function can create
significant error. The intuition behind this observation is
straightforward: if the range of variation in X is large, the chord
connecting extreme values of X can lie farther away from the function F.
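
The chord intuition can be illustrated with a sketch in which X takes only two equally likely values, m - s and m + s, so that Var(X) = s^2 and the mean response is the midpoint of the chord. The two-point distribution, the choice m = -1, and the logistic form of F are assumptions made for illustration.

```python
import math

def F(x):
    """Logistic distribution function, used here as an example of F."""
    return 1.0 / (1.0 + math.exp(-x))

def chord_gap(m, s):
    """Distance between the chord midpoint and the curve at the mean m."""
    return abs(0.5 * (F(m - s) + F(m + s)) - F(m))

m = -1.0                        # hypothetical mean of X
spreads = [0.25, 0.5, 1.0, 2.0]  # hypothetical standard deviations
gaps = [chord_gap(m, s) for s in spreads]
for s, g in zip(spreads, gaps):
    print(f"Var(X) = {s * s:5.4f}   gap = {g:.4f}")
```

Holding the mean fixed, the gap grows steadily as the two points move apart.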


These conditions can be made more explicit with the aid of a common
nonlinear statistical model for discrete outcomes, the binary logit
model. The binary logit model may be specified as follows:

$$ y = X\beta + \varepsilon $$

$$ q = \begin{cases} 1 & \text{if } y < 0 \\ 0 & \text{otherwise} \end{cases} \qquad (4) $$

$$ P(q = k) = F(y), \quad k \in \{0, 1\} , $$


where  F(y)  is  the  logistic  distribution  function,  y  is  a  continuous 
variable,  and  q  is  a  binary  indicator  variable. 

Figure  1  shows  the  cumulative  distribution  and  density  functions 
(F(X)  and  f(x),  respectively)  for  the  logistic  distribution.  From 
figure  1,  it  is  obvious  that  the  curvature  of  the  function  F(X)  is 
smallest  when  X  is  near  zero  and  for  extreme  values  of  X.  Thus,  the 
curvature  of  F(X)  is  smallest  when  F(X)  is  relatively  small,  close  to 
0.5,  or  relatively  large. 


1.  A  third  factor,  which  is  more  subtle  than  the  first  two  mentioned,  is 
that  the  curvature  of  F  is  estimated  with  an  error  that  depends  on, 
among  other  things,  the  degree  of  "smoothness"  in  the  data.  If  the  X 
variables  are  highly  clustered,  the  estimate  of  the  parameters  of  F,  and 
therefore of F^(2), may be sensitive to slight perturbations in X. If
the  population  is  substantially  homogeneous  in  its  measured 
characteristics  (i.e.,  the  independent  variables),  errors  from  using  the 
mean  value  of  independent  variables  should  be  minimized. 
Figure 1. Logistic probability density and distribution


Figure 2 graphs F^(2)(X) as a function of F(X). When F(X) lies in
the ranges 0.07 to 0.35 and 0.65 to 0.93, the distribution function has
significant curvature. Therefore, except for empirical situations in
which the mean response is less than about 0.07, between 0.35 and 0.65,
or greater than 0.93, curvature may create difficulties in
approximating (1/n) Σ F(X_i) by F(X̄).


Figure  2.  Curvature  of  logistic  probability  distribution 
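
The relationship between curvature and the level of the response can be tabulated directly, because the second derivative of the logistic distribution function can be written in terms of F itself: F^(2) = F(1 - F)(1 - 2F). The sketch below evaluates this identity on a grid; the grid and the function name are illustrative choices only.

```python
import math

def curvature_from_probability(p):
    """F''(x) for the logistic distribution, written in terms of p = F(x)."""
    return p * (1.0 - p) * (1.0 - 2.0 * p)

# Locate the probability at which the absolute curvature is largest.
grid = [i / 1000 for i in range(1, 1000)]
peak = max(grid, key=lambda p: abs(curvature_from_probability(p)))

for p in (0.07, 0.21, 0.35, 0.50, 0.65, 0.79, 0.93):
    print(f"F = {p:.2f}   F'' = {curvature_from_probability(p):+.4f}")
print(f"largest |F''| on the grid near F = {min(peak, 1 - peak):.3f} "
      f"(and, by symmetry, {max(peak, 1 - peak):.3f})")
```

The curvature is exactly zero at F = 0.5, changes sign there, and is largest in absolute value near F = 0.21 and F = 0.79, which lie inside the two ranges cited above.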
As shown in figure 3, the actual error in approximation depends on
the variance of the right-hand-side variables as well as the curvature
of the distribution function local to the mean response. Figure 3 shows
how the approximation error is related to the variance (the variance of
the argument of the distribution in logit models generally is less than
unity). The effect of smaller variance essentially is to compress the
curvature function presented in figure 2.


Figure  3.  Approximation  error  (logistic  distribution) 
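
The qualitative pattern can be explored with a short sketch. It assumes, purely for illustration, that the argument of the distribution function is normally distributed around its mean; the paper does not state the distribution behind the figure, and the variances 0.25 and 1.0 are arbitrary choices.

```python
import math
from statistics import NormalDist

def logistic(z):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Inverse of the logistic function."""
    return math.log(p / (1.0 - p))

def approximation_error(mean_index, variance, n=2001):
    """|mean of F(index) - F(mean index)| for an (assumed) normal index.

    The index values are evenly spaced normal quantiles, so the result is
    deterministic rather than simulated.
    """
    sd = math.sqrt(variance)
    std_normal = NormalDist()
    values = [mean_index + sd * std_normal.inv_cdf((i + 0.5) / n)
              for i in range(n)]
    mean_response = sum(logistic(v) for v in values) / n
    return abs(mean_response - logistic(mean_index))

for p in (0.05, 0.2, 0.5, 0.8, 0.95):
    low = approximation_error(logit(p), 0.25)
    high = approximation_error(logit(p), 1.0)
    print(f"F at mean = {p:.2f}   error (Var 0.25) = {low:.4f}   "
          f"error (Var 1.0) = {high:.4f}")
```

Under these assumptions the error vanishes when the response at the mean is 0.5 and, at a given response level, is larger for the larger variance.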


The  following  general  observations  may  be  made  regarding  the  error 
associated  with  estimating  the  mean  response  using  the  mean  value  of  the 
independent  variables. 

• In most standard probability models, if the mean of the
dependent variable lies near 0.5, the cumulative
distribution function is essentially linear, and the error
is close to zero.

• Similarly, in these probability models one need be less
cautious when dealing with very likely or very unlikely
events.

• However, in many empirical situations, the mean response
may be 0.1 to 0.3 or 0.7 to 0.9, which are ranges in which
the approximation error is greatest.

• Variation in the magnitude of the variance of the
independent variables does not change the dependence of
the approximation error on the curvature of F(X), but does
affect its amplitude.
EMPIRICAL EXAMPLES


Mean  Response  in  Region  of  Significant  Curvature 

In a recent study of retention among Marines,¹ the intention to
reenlist was modeled using a binary logit specification in which a
continuous variable measured propensity to remain in the military, and a
binary variable indicated intention to quit. Table 1 presents the
relevant results.


Table 1. Estimates of logit model of USMC retention

                                                   Parameter   Standard   Mean of
Variable                                           estimate    error      variable

Overall satisfaction with military life (scaled)    -0.539      0.066       1.4
Satisfaction with family services (scaled)          -0.014      0.014       3.3
Perceived chances of good civilian job (scaled)      0.065      0.022       8.6
Length of service (yr)                               0.061      0.021       9.0
Whether married                                     -0.323      0.134       0.7
Whether any minor children                          -0.699      0.135       0.5
Age                                                 -0.018      0.020      28.1
Intercept                                            1.874      0.478

Var(Xβ)                                              0.536
(1/n) Σ F(X_i β)                                     0.335

1. CNA Report 139, A Study of Marine Corps Family Programs, by Edward S.
Cavin, September 1987.
Using these results, one can calculate the error in approximating
the mean response by the distribution function evaluated at the mean of
the independent variables:

$$ \left| \frac{1}{n} \sum_{i} F(X_i \beta) - F(\bar{X} \beta) \right| = | 0.335 - 0.318 | = 0.017 . $$

Thus, the error is approximately 2 percentage points. Since the
logistic distribution function is significantly curved in the
neighborhood of the index evaluated at the mean of the independent
variables (i.e., -0.7627), the error caused by using the approximation
of evaluating the distribution function at the mean is relatively large.
(It would be somewhat larger, but for the relatively small variance of
the X_i β.)

Mean  Response  in  Region  of  Little  Curvature 

A  very  different  issue  recently  was  addressed  using  a  similar  kind 
of  binary  logit  model,  namely  the  failure  rate  of  certain  avionics  on 
U.S.  Navy  aircraft.  These  avionics  monitor  the  status  of  the  weapon 
systems  onboard  the  aircraft,  and  can  be  tested  using  alternative 
automated  testing  procedures.  For  the  purposes  of  this  paper,  however, 
the  important  point  is  that  the  failure  rate  per  flight  is  very  low 
(less  than  1  percent). 

Table  2  presents  the  relevant  results  from  this  analysis. 

As with the previous example, one can calculate the error in
approximating the mean response by the distribution function evaluated
at the mean of the independent variables:

$$ \left| \frac{1}{n} \sum_{i} F(X_i \beta) - F(\bar{X} \beta) \right| = | 0.0070 - 0.0045 | = 0.0025 . $$

In this case, the error is less than 0.3 percentage point. This is
because the logistic distribution has very little curvature in the
neighborhood of the index evaluated at the mean of the independent
variables (i.e., -5.392). (The error would be a little larger if the
variance were somewhat larger in this example.)
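
The arithmetic behind both examples can be retraced from the figures quoted in the text. The sketch below takes the reported index values, variances, and errors as given and computes the logistic function, its curvature, and the second-order term of equation 3; the dictionary layout and function names are illustrative, and nothing here re-estimates either model.

```python
import math

def logistic(z):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_curvature(z):
    """Second derivative of the logistic function: F(1 - F)(1 - 2F)."""
    p = logistic(z)
    return p * (1.0 - p) * (1.0 - 2.0 * p)

# (index at the mean, Var(X beta), reported error), all as quoted in the text.
examples = {
    "USMC retention": (-0.7627, 0.536, 0.017),
    "avionics failure": (-5.392, 0.313, 0.0025),
}

results = {}
for name, (idx, var, reported) in examples.items():
    at_mean = logistic(idx)
    curv = logistic_curvature(idx)
    second_order = 0.5 * abs(curv) * var
    results[name] = (at_mean, curv, second_order)
    print(f"{name}: F = {at_mean:.4f}, F'' = {curv:.4f}, "
          f"0.5|F''|Var = {second_order:.4f}, reported error = {reported}")
```

The computed values of F (about 0.318 and 0.0045) match the figures used in the two error calculations. The second-order term is only a rough guide to the reported errors: it comes to about 0.021 against the reported 0.017 in the first example and about 0.0007 against the reported 0.0025 in the second.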


1. The curvature of the distribution function at X = -0.7627 is
F^(2) = 0.078.

2.  Example  is  from  CNA  Research  Memorandum  88-175,  A  Comparison  of  the 
Repair  Times  and  Effectiveness  of  ATS  and  IAFTA,  by  Robert  A.  Levy  and 
Beverly  Spejewski,  January  1989. 

3. The curvature of the distribution function at X = -5.392 is
F^(2) = 0.004.
Table 2. Estimates of logit model of avionics failure

                                                   Parameter     Standard      Mean of
Variable                                           estimate      error         variable

Whether new test procedure used                      0.396        0.510          0.199
Whether new procedure indicated failure              0.852        0.461          0.069
Whether aircraft based at airfield 1                 0.119        0.439          0.541
Whether aircraft based at airfield 2                -0.627        0.734          0.095
Whether aircraft based at airfield 3                -0.832        0.710          0.245
Number of flights since last failure                -0.028        0.006         90.2
(Number of flights since last failure)^2             0.8 x 10^-4  0.2 x 10^-4   14,114
Days since last flight                               0.016        0.006          1.63
Intercept                                           -4.022        0.408

Var(Xβ)                                              0.313
(1/n) Σ F(X_i β)                                     0.007

CONCLUSION 

These  examples  illustrate  that  the  error  in  approximating  the  mean 
response  in  a  nonlinear  model  by  evaluating  the  nonlinear  function  at 
the  mean  of  the  independent  variables  is  positively  related  to  the 
degree  of  nonlinearity  in  the  function.  Given  the  ease  with  which 
nonlinear functions can be evaluated for each observation and then
averaged, the shortcut of evaluating the function "at the means" should
not be used unless one is certain that the function is locally linear.